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ABSTRACT 

With the continuous increase of online services as well as 
energy costs, energy consumption becomes a significant cost 
factor for the evaluation of data center operations. A sig- 
nificant contributor to that is the performance of database 
servers which are found to constitute the backbone of on- 
line services. From a software approach, while a set of novel 
data management technologies appear in the market e.g. 
key-value based or in-memory databases, classic relational 
database management systems (RDBMS) are still widely 
used. In addition from a hardware perspective, the majori- 
ty of database servers is still using standard magnetic hard 
drives (HDDs) instead of solid state drives (SSDs) due to 
lower cost of storage per gigabyte, disregarding the perfor- 
mance boost that might be given due to high cost. 

In this study we focus on a software based assessment of the 
energy consumption of a database server by running three 
different and complete database workloads namely TCP-H, 
Star Schema Benchmark -SSB as well a modified benchmark 
we have derived for this study called W22. We profile the 
energy distribution among the most important server com- 
ponents and by using different resource allocation we assess 
the energy consumption of a typical open source RDBMS 
(PostgreSQL) on a standard server in relation with its per- 
formance (measured by query time) . 

Results confirm the well-known fact that even for complete 
workloads, optimization of the RDBMS results to lower ener- 
gy consumption. 



Categories and Subject Descriptors 

H. 2.4 [Database management]: Systems — Query process- 
ing; H.3.4 [Information storage and retrieval]: Systems 
and software — performance evaluation ( effiency and effec- 
tiveness) 
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I. INTRODUCTION 

A general assumption is that efficient programs are also ener- 
gy efficient [8]. This seems to be reasonable since a program 
requiring less computation time will probably also require 
less energy for its execution. This can be true if we assume 
a constant energy use of a computer, and the computer per- 
forms more tasks in the same amount of time because of 
time efficient programs. 

However, a more detailed analysis may show a different pic- 
ture. For example it is a well-known fact that one complexity 
measure like time can be optimized at the costs of another 
complexity measure of space. An optimization following this 
path may require that the increased amount of data for a 
program may be transferred from one type of memory to a 
different type of memory. In a modern computer this might 
result that the data may be moved from the CPU cache to 
the main memory or from main memory to a drive. An 
obvious consequence of this transfer are longer access times 
which may already reduce the benefit of these optimization 
strategies. In addition the usage of a different memory type 
may result in higher energy costs [14]. For example, this can 
be the case if a large data structure needs to be generated 
on an external drive and the access to this drive would need 
to be frequently used in comparison to a solution where the 
drive may be even stopped to save energy. 

Since the amount of information to be processed has drasti- 
cally increased over the last couple of years, large and very 
large data centers which are expected to support internet 
operation have been built by major players (e.g. Google 



and Amazon). However, increased costs of operation (elec- 
tricity, cooling costs) have made this vendors to offer the 
rental of unneeded data center capacity (disk storage space, 
computation time) to nearly everyone who is willing to pay. 
This gives rise to the concept of "cloud computing" where, 
for example, server capacities are only needed for a limited 
time or where the acquisition of an own data center is too 
expensive^. 

Furthermore, in the context of database applications another 
issue might arise: the distribution of data across a database 
cluster as a basis for distributed computing, as it happens in 
the case of Big Data/ Hadoop clusters[l]. Industry trend^J 
advocate that increased demand for server performance will 
be fulfilled by deploying more (database) servers and data 
centers. But with increased energy costs, data centers ope- 
rators are looking for mechanisms to avoid or reduce those 
costs (scale-up strategies). This way the energy efficiency of 
a single database server becomes important as it can be a 
crucial component for the overall cost assessment of a data 
center operation where energy costs are the most significant 
factor on their operation and scalability. 

1.1 Background and motivation 

Several approaches which can be found in the literature 
have addressed the issue of energy efficiency of database 
servers from different aspects. Harizopoulos et. aZ[l3] as well 
as Grae/e[5] divided energy efficiency improvements into a 
hardware and a software part. Recent work such as the one 
by Baroso and /f672;Ze[3] has revealed the fact that the im- 
pact of hardware improvements on the energy efficiency is 
quite low. A main approach in that direction is the use of 
Solid State Drives (SSD's) which have been shown to im- 
prove performance on typical database operations such as 
sorting and therefore energy efficiency [2]. Obviously SSDs 
are faster in such kind of scenarios [To], but there are other 
tradeoffs as a replacement for HDDs with most obvious the 
cost per gigabyte (Schall et. aZZ[ll], Schroder-Preikschat et. 
aZZ[l2], Harder et. all^\). 

On the other side, software improvements seem to be more 
attractive when it comes to increasing the energy efficiency 
of a database server, especially for relational databases. Lang 
and PateZ[l6], for example, proposed a database query re- 
ordering technique to influence the energy consumption of 
the database server. Xu et. all 20 proposed a modification 
of the query planner in order to take the estimated energy 
consumption into consideration. 

Tsirogiannis et. a/Z [14] had a very detailed investigation re- 
garding the relation between performance and energy con- 
sumption in a RDBMS. For that they did extensive measure- 
ments with various hardware configurations that are typi- 
cally found in a database server for a scale-out scenario. In 
addition to this, they identified performance tradeoffs for 
different SQL query operators. 

1.2 Objective of this study 

This study provides the basis for a more detailed analysis 
of the energy consumption in the context of database ap- 
plications that are common in daily workloads of enterprise 

1 State of the data center 2011 by Emerson 



users. For this purpose the paper does not only consider 
the total power consumption of the full system but also the 
usage of energy by different components. In addition the pa- 
per analyses standard energy saving strategies coming with 
a modern computer system and their impact on the power 
consumption. 

In the context of database applications, it is important to 
know which database operation or which combination of 
SQL operators affects which of the measured components. 
This study analyzes these combinations and their fraction 
of the overall power consumption. 

All the mentioned aspects lead to the final view on the 
energy effiency of a single database server. This is impor- 
tant when it comes to a scale-out scenario that is typically 
found in a database cluster. There are two choices: either 
one decreases the energy consumption of the used hardware 
equipment with a small decrease of the database response 
time, or increases the performance while accepting a slight 
increase of the energy consumption. Both ways improve the 
energy effiency. 

As suggested by Xm [18] , more experimental research has 
to be done concerning the energy consumption of database 
servers. Taking this into account, this study tries to have 
a closer look on both choices mentioned above and to give 
recommendations how to improve the energy effiency of a 
single database server in a general way (for example the us- 
age of buffers and indices combined with traditional HDDs 
as the primary storage for the database files). 

2. MEASUREMENT METHODOLOGY 
2.1 Test server preparation 

For the purpose of this study, we constructed a database 
server making use of recent technical core components. The 
operating system selected for testing was a typical Linux 
distribution ( Ubuntu Server version 11.10) using a stable 
kernel (kernel version: 3.0.0.17). In order to eliminate bi- 
ases from the operating system in our measurements, all 
unnecessary operating system services were turned off. The 
database management system (DBMS) that was used was 
PostgreSQL version 9.1. The hardware characteristics of 
the test server are provided in table [I] 



CPU 


Intel Core i7-860 @2.8 GHz 


Main memory 


3x Samsung DDR-2 2 GBytc 800 MHz 
(m378b5673fh0-ch9) 


Hard drives 


3x Hitachi 1 TByte, 16 MByte cache, 
(HDT721010SLA360) 



Table 1: Hardware characteristics of the test server 
used in our study 



Two of the three hard drives were combined as a striping 
RAID array in order to boost performance and to separate 
the filesystem calls of the DBMS from the operating system. 
The operating system itself was installed on the remaining 
third hard drive. 

We identified the core components we were interested in 
their power consumption throughout the various tests as 



follows: (a) CPU, (b) main memory, (c) motherboard as a 
whole and (d) the hard drives of the RAID array. These 
core components were used as a unity of analysis in order to 
assess the optimization options. Additionally we were inter- 
ested in the impact of the operating system settings as well 
as the test server capabilities on the energy consumption of 
the core components. Appendix [A] provides the complete 
test arrangement. 

Besides, the test server's motherboard used Intel's ElSfj^] 
In general EIST enables and disables CPU cores or reduces 
and raises the overall CPU clock frequency as a function of 
the CPU usage. This also affects other technical aspects, 
e.g. the heat dissipation from the CPU. 

Recent Linux kernels offer several modules providing more 
data to the frequency scaling heuristics. The most handful 
modules for our test server configuration are the on-demand 
and the power-saving modules. Taking this into account, 
the command line tool powerto^] was used to suggest con- 
figurations on the disablement of operating system services 
that might influence energy consumption. 
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2.2 Stress tests on energy consumption 

Before assessing the energy efficiency as a whole as well as 
the components we were interested in profiling their energy 
consumption, we performed stress tests to get a detailed 
overview of the energy distribution among the single com- 
ponents. Figure [T] summarizes the stress tests. Each tuple 
of the shown bars represents a stress test for a core compo- 
nent. The left bar of a specific tuple illustrates the energy 
consumption of each component in conjunction with the on- 
demand frequency scaling module. The right bar of the tuple 
displays the energy consumption of all energy saving set- 
tings enabled, respectively. Figure [l] also shows the energy 
consumption of the power supply unit (PSU) which is the 
difference between the overall power consumption of the test 
server and the one of the measured components. 

Our first action was to calibrate the test procedure by ana- 
lyzing the energy consumption of the core hardware compo- 
nents as a basis for the other tests. All services and appli- 
cations not required for the operating system as well as the 
DBMS service were turned off. The operating system uses 
default settings, e.g. the usage of the on-demand CPU fre- 
quency scaling module. The average consumption per core 
component is shown in figure [T] in the left bar of the tuple 
named Idle. The right bar of this tuple shows the energy 
consumption with all available energy saving settings turned 
on. There is a slight increase of the power consumption: al- 
though the CPU is forced to reach the energy saving idle 
states, it is constantly interrupted by doing so, for example 
by administrative background processes of the operating sys- 
tem. Changing the CPU state costs some effort but results 
in higher energy consumption. 

To measure the power consumption stressing the CPU of 
our test server, we decided to use the Linux command line 

2 Acrynym for Intel Enhanced Speedstep. It allows the sys- 
tem to dynamically adjust the CPU frequency and voltage 
to decrease energy consumption and heat production. 
3 Refer to http: //www. lesswatts . org/projects/powertop/ 



Figure 1: Energy consumption per component dur- 
ing our calibration stress test 



tool burnMMX because it utilizes all parts of the CPU in- 
cluding extension like MMX or AVX. Based on the eight 
CPU cores reported to the operating system, we ran our 
CPU stress test eight times and increased the number of 
running burnMMX instances accordingly. As assumed the 
power consumption increases up to the four real cores and 
remains relatively constant if more cores were used due to 
HI *\ This behavior corresponds to the CPU tests reported 
in 14]. The CPU stress test with the highest energy con- 
sumption for both scenarios (energy saving settings turned 
off and on) is depicted in figure [I] as the bar tuple labeled 
CPU for comparison with the other stress tests. 

To stress test the hard drives in the RAID array we executed 
the I/O benchmark suite iozone while observing the power 
consumption throughout the different benchmark tests (dif- 
ferent access strategies, buffered and un-buffered data access 
and so on) . Our test results are displayed in figure[l]in the in 
the second bar tupel labeled Disks. Our test shows that the 
impact on the energy consumption is low when all energy 
savings settings are turned on. However, in contrast to the 
power consumption of current SSDs[12] the power consump- 
tion of the hard drives is two to three times higher. 

Finally we stressed the main memory with the command 
line tool memtester that uses different patterns to access the 
main memory. It also tests for the correctness of the contents 
by writing, reading and comparing the main memory areas. 
The effect on the power lane supplying the main memory 
was not measurable. In contrast to this, the left bar of the 
tuple named Main memory in figure [l] indicates an increase 
of the overall power consumption of 30 Watts in which the 
CPU is responsible for. Even with all energy saving settings 



4 HT is an acronym for Hyperthreadmg by Intel. It is used 
to improve parallelization of computation. 



turned on[19], the overall energy consumption was higher 
than in idle state. 

2.3 Measuring energy efficiency 

We define the performance ratio (P) of a single database 
query as the unit of time to execute the query in time (t) to 



obtain the results 15 



(1) 



We then normalize a set of performance values to values 
between zero and one as follows: 

1 

Pnormalized,i — Pi 



max(P) 

We then define the energy efficiency EE as the ratio between 
the performance P of the database query and the electrical 
work W executing the query: 

EE = w (2) 

We then normalize the set of efficiency values to values be- 
tween zero and one as follows: 

1 
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2.4 Selection of workloads and DBMS param- 
eters 

As aforementioned our intuition here is to examine the ef- 
fects of executing a database query in relation to the overall 
power consumption. For example a combination of a not 
well formed SQL query and an optimized query planner can 
cause a cascade of operations that consumes a lot of unnec- 
essary power, e.g. sequential scans cause unnecessary hard 
drive accesses with all its overhead in the operating system 
(access control, swapping and so on). 



According to related work, for example 14 or [3], we identi- 
fied several settings for our PostgreSQL database server we 
hypothesized they had an important impact on the power 
consumption. Those settings can be divided into two groups: 
the first one are the settings for the underlying operating 
system and for PostgreSQL and the second one are settings 
for the database itself. In detail those settings are: 

• The size of the main memory assigned to PostgreSQL 
to operate 

• The size of the miscellaneous buffers, e.g. for sorting 
resulting rows or caching 

• The settings for the query planner 

• The size of the data in the database 

• The session type of executing subsequent queries (sin- 
gle session vs. multi session) 

• Combinations of the different SQL operators and func- 
tions, for example increasing number of joined tables 
or the number of result set dimensions to be restricted 

To get comparable results we decided to run three com- 
plete database benchmark workloads: TPC-i/[9], the Star 
schema benchmark (SSB[7]) and a third benchmark con- 
structed specifically for this study, we call it the W22 bench- 
mark. The first two offer a standardized way for comparison 



whereas the last one is a workload we composed to analyze 
the behavior of PostgreSQL not covered by the previously 
mentioned workloads. 

3. WORKLOAD RESULTS 
3.1 TPC-H workload 

For the TPC-H workload we used their data generator to 
generate three databases with a size of 1, 5 and 10 GByte of 
data as well as all suggested indices. We chose this database 
sizes because we wanted the databases to fit completely, 
nearly and under no circumstances in the working main 
memory for PostgreSQL by assigning 1, 2.5 and 5 GByte. 
This selection was made due to the limitation of 6 GByte 
overall main memory present in the database server. In a fi- 
nal setup step we optimized the internal data structures and 
the query planner statistics of PostgreSQL for the created 
databases. 

At first we ran the benchmark queries subsequently using 
a new database connection to avoid PostgreSQL's internal 
cach^] After this we retried the TPC-H benchmark using a 
single database connection for the SQL queries. The average 
power consumption per component for both test series is 
shown in figure [2] 
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Figure 2: Energy consumption per component for 
TPC-H workload 



5 Notice that PostgreSQL's implementation isolates client 
connections completely by running the client sessions in 
shared-nothing processes. The client processes only share 
some IPC memory for synchronization purposes and the 
ACID functionality. This means that every client has its 
own cache. 



This figure clearly indicates the higher energy consumption 
when a single session is used. We recognized that the average 
overall power consumption for all of our TPC-H tests does 
not vary a lot. Please compare only the bars which are 
identically labeled with each other. We assumed that the 
CPU and hard drives are the main energy consumer but in 
fact it turned out that the mainboard is the biggest one. 

We also assumed that the usage of single database connec- 
tion would improve the performance because of the better 
usage of PostgreSQVs internal cache but the opposite oc- 
cured: the performance was nearly the same with an in- 
creased power consumption of 7 percent on average. Using 
the equations outlined in section |2.3[ this leads to a lower 
energy effiency of nearly 8 percent on average as depicted in 
figure [3] 



Finally we were interested in performance versus energy 
effiency ratio as stated in the conclusion of f 4 . This ratio 
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Figure 3: Energy effiency for TPC-H workload 



A detailed study of the query plans reveals a broad usage 
of sequential scans on the TPC-H database tables as well 
as a low usage of the internal caches. Therefore we modi- 
fied PostgreSQL's settings regarding the query planner and 
caches to favor the provided indices and repeated the tests. 
This has an inverted effect: the performance decreases by 
about 4 percent on average. The reason is the overhead to 
process the indices which are also disk bounded. 

In general we observed a big effect on the energy effiency 
by assigning a higher portion of the main memory to Post- 
greSQL. This results in massive swapping for some TPC-H 
queries, e.g. f , 9 and 21, and the suspension of the Post- 
greSQL process. In fact, PostgreSQL is very disk bound and 
by accessing the database files the operating system swaps 
heavily because of the reduced main memory portion. Be- 
sides, swapping does effect only the hard drive and not the 
CPU. This affects the overall power consumption for our 
TPC-H tests and explains the small variation of the values. 



for our entire TPC-H tests with all its different configura- 
tions is shown in figure [4] and confirms the strong relation 
between performance and energy effiency (the most energy- 
effiencent configuration is typically the highest performing 
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Figure 4: Performance vs. energy effiency for TPC- 
H workload 



This figure also shows that the majority of the configurations 
are clustered in the lower left area. This means that most 
of the queries of the TPC-H benchmark show a poor energy 
effiency. 

3.2 Star schema workload 

The Star schema benchmark (SSB) was initially composed 
to get a database layout closer to reality compared to TPC-H 
which it is based on. According to [7] several original tables 
were decoupled to make many join operations unnecessary 
and the set of SQL queries of SSB were created to be more 
realistic and to test the database capabilities regarding range 
coverage and indice usage. 

For executing the SSB SQL queries we used the same param- 
eters as described in section [3~T| we generated three different 
databases with 1, 5 and 10 GByte of data and created test 
configurations for PostgreSQL with 1, 2.5 and 5 GByte main 
memory to work on. 

Just as for the TPC-H workload, we ran the SSB database 
queries subsequently by using a single and multiple database 
connections. The average energy consumption per com- 
ponent is depicted in figure [5] The type of accessing the 
database matters: although the average performance is near- 
ly the same, the average power consumption for multiple 
connections is lower than the one for using a single connec- 
tion. The latter consumed nearly 10 percent more energy 
on average. Except for the database size of 1 GByte, this 



results in a lower energy effiency. 
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Figure 5: Energy consumption per component for 
the SSB workload 



1 GByte main memory 



2.5 GByte main memory 



CD 

■C ' 

.Ol" 

-c 

OJS 



ft 



5 GByte main memory 



DB connection type 
■ multiple 
□ single 



i s 



Database size in GByte 



Figure 6: Energy effiency for SSB workload 



interfering the results. Figure [6] shows the energy effiency of 
our SSB tests. 

A deeper investigation of the query plans showed us that 
the reduced set of tables and the modified queries lead to a 
more stable and predictable behaviour concerning the power 
consumption and energy effiency. In particular this can be 
seen in figure[6] the energy effiency for a given constant SSB 
database size does not vary a lot and is relativly independent 
from the amount of main memory spent for PostgreSQL. 

Besides, the energy effiency of the SSB database with 10 
GByte is remarkably lower than the ones with 1 and 5 GByte 
of data. An investigation of the logged system activities dur- 
ing the tests revealed heavy swapping actions which caused 
the suspension of the PostgreSQL process handling the que- 
ries. As illustrated in figure [5] the low CPU energy con- 
sumption indicates the heavy swapping activity. 

The query plans also show the general use of sequential scans 
on the SSB database tables. In fact, no indices were involved 
to execute the queries in all configurations. The execution 
of the queries purely relies on the read performance of the 
database files. As figure [7] indicates, it does not matter if 
the result set of the joint tables are further restricted. In 
addition to this, this figure shows that main memory spent 
for buffers is relativly unimportant: if one consider a specific 
database size, the row scan per second rate does not vary a 
lot. This means that the mentioned rate is independent of 
the main memory fraction assigned to PostgreSQL and also 
independent of the number of dimensions the result set is 
restricted. 
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In general and in terms of energy effiency, the SSB bench- 
mark performs better than TPC-H. In addition to this, SSB 
organizes its SQL queries into groups in which the requested 
rows do not overlap and the group number indicates on how 
many dimensions the result set has to be restricted. This 
allows a better analysis of the results and avoids side effects 



Figure 7: SSB row scans per second 



The performance vs. energy effiency ratio for all SSB test 
configurations is illustrated in figure [8] The same strong 
relationship can be seen as in figure [4] for the TPC-H bench- 
mark tests. In contrast to TPC-H, the vast majority of the 
SSB test configurations is clustered in the upper right corner 
of figure |U This means that the SSB configurations perform 
much better. This leads to a better energy effiency. 
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3.3 W22 workload 

To get a more in-detailed look at the impact of the differ- 
ent SQL operations and their combinations we composed a 
database workload called W22. The workload consists of six 
groups of database queries: 

1. aggregate functions (count, avg and sum) 

2. grouping (GROUP BY) 

3. sorting (ORDERED BY) 

4. selecting different data types, e.g. int, varchar, text 
and date 

5. removing duplicates (DISTINCT) 

6. joins (cross joins, conditional joins) 

In contrast to this, the TPC-H and the SSB workload are 
designed for benchmark an OLAP scenario. Their database 
queries use a mix of different SQL operations coming from 
the groups above. Our W22 workload instead tries to ana- 
lyze the impact of SQL queries in each group. Besides, this 
workload allows to analyze the behaviour of PostgreSQL's 
internal query optimizer and query planner as well as the in- 
teraction with the underlaying operating system. The W22 
queries operate on the TPC-H databases described in section 

I3~i1 



3.3.1 Aggregate functions 

Our firsts tests with aggregate functions, e.g. avgO, sum() 
and count () , show in particular the impact of the filesystem 
cache of the underlaying operating system. 



All of the mentioned aggregate functions cause PostgreSQL 
to perform a sequential scan. The first query that was exe- 
cuted (count (*)) had a notably longer execution time com- 
pared to the next ones (avgO and sumO). The query plans 
for all three queries are the same. After a closer investi- 
gation of the logged activities of the operating system, we 
identified the filesystem cache as the performance booster by 
caching the database files that PostgreSQL uses. In this case 
the kind of executing the queries (single session vs. multi 
session) does not matter. The most important setting is the 
fraction of main memory assigned to PostgreSQL: the lower 
the fraction the more main memory is available for system 
caches. 

The effect of file system caches provided by the operating 
system is depicted in figure [9] for the mentioned aggregate 
functions. Due to the longer execution time the energy con- 
sumption was higher resulting in a lower energy effiency. 
Subsequent queries benefited of the cache. 
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The suggested optimizations of the queries, e.g. the use of 
an indexed column for count () (count (column) instead of 
count (*)), did not change the execution times. 

3.3.2 Grouping and sorting 

The queries of group 2 (grouping) and group 3 (sorting) 
expands the count () query from group 1 with SQL opera- 
tors like ORDER BY and GROUP BY to get comparable results. 
Those queries were composed to study the impact of the 
mentioned SQL operators in reaction of the previous tested 
benchmarks TCP-H and SSB. 

The tests result indicates only a slight difference in terms of 
execution times and energy consumption compared to the 
test results of group 1. The impact on the energy consump- 
tion of the CPU and the main memory was not measurable. 
As a result the calculated energy effiency remains the same. 



3.3.3 Selection of different data types 

According to our tests, the selection of a specific data type, 
e.g. VARCHAR, DATE and INTEGER, has no impact on the query 
execution time. 

The other purpose of the queries of group 4 was to test 
the operators of the different data types with and with- 
out having an index on the particular table column. We 
observed a big impact on the execution time when an in- 
dex was used whereas the average energy consumption was 
slightly higher. The presence of an index in combination of 
an operatoi^] supporting this index lead to an immense per- 
formance gain. Therefore the energy efhency is quite high 
compared to a sequential scan. In contrast to the queries 
with an absent index, the queries with an involved index 
showed an almost linear performance. This is crucial since 
we used the largest table of TPC-H, lineitem, which also 
has the greatest amount of rows throughout all queries of 
this group. 

PostgreSQL uses an index for a query when a) the index 
supports the operator of the query and b) the costs for pro- 
cessing the index are lower than sequentially scanning the 
table. Those costs are composed of customizable base costs 
and dynamic cost estimations that PostgreSQL gathers pe- 
riodically and statistically from all tables of a database. 

We modified the settings for the base costs of loading and 
processing a database and an index page (usually 8 KBytes 
of data) to favor the usage of indices, e.g. if the database 
and indice files are stored on different storage devices with 
different access speeds (the indice files are usually stored on 
the faster one). Our test results remain the same because 
the data and index files are stored on the same hard drive 
device. 

3.3.4 Joining tables and eleminating duplicates 
Based on our TPC-H and SSB benchmark results, we were 
interested in the behaviour of PostgreSQL dealing with table 
joins. There are two kinds of joins supported in PostgreSQL: 
unconditional and conditional joins. 

Our first query of this W22 group deals with an uncondi- 
tional join of two tables where the cross product is further 
restricted by the conditions given after the WHERE clause (one 
restriction per involved table). We expected this query to 
be unperformant due to the cross product and the succes- 
sive restriction of the result set, but this was not the case. 
The query plan reveals the (unintentional) use of an index 
for one restriction and a sequence scan for the other one. 
So the results (performance, energy consumption and the 
energy effiency) are the same as mentioned in the last sec- 
tion although a sequence scan is part of the query plan. 



For example, the operators greater than, less than and 
equals (<, > and =) are valid for B-tree indices on numeric 
table columns in PostgreSQL. There are other index types, 
e.g. inverted index, for other column data types as well as 
their specialized operators. 



The other queries of this W22 group were composed to join 
two TPC-H tables using innejjand equi-joinaj to examine 
differences in the query plans. 

Interestingly those queries indicate the same characteristics 
in terms of the query planner. For PostgreSQL it does not 
matter where the condition for joining two tables is placed. 
In other words, PostgreSQL does not distinguish between an 
inner, implicit or equi-join. 

Besides, the queries are composed to join two TCP-H tables 
with different number of rows to investigate the join perfor- 
mance. The first two queries joined the lineitem table with 
the orders and the part table, respectivly. The last query 
joined the orders with the customers table. Please refer to 
table [2] to get an overview of the number of rows. Finally we 
select the amount of joined rows by using the count (*) ag- 
gregate function. This forces PostgreSQL to use sequential 
scans for the mentioned tables. 





Database size 


Table 


1 GByte 


5 Gbyte 


10 GBytc 


lineitem 


6.001.215 


299.999.795 


599.986.052 


orders 


1.500.000 


7.500.000 


15.000.000 


part 


200.000 


1.000.000 


2.000.000 


customer 


150.000 


750.000 


1.500.000 



Table 2: Number of rows for some TPC-H tables 

As assumed, our test results indicate that the join perfor- 
mance is strongly related to the row scan performance. The 
test results are similar to the ones of our TPC-H and SSB 
benchmarks. This means they do not resemble much in their 
energy consumption but in their execution times. Unsurpris- 
ingly the lower the amount of rows to be scanned for join- 
ing, the lower is the execution time and therefore the energy 
effiency is quite better. Although compared with query with 
the unconditional join mentioned above, the performance is 
fundamentally worse. 

We were also interested in the effects of eleminating du- 
plicates from a result set. For this purpose we formed a 
test query using the DISTINCTO clause on a column of the 
lineitem table not having a supporting index. The query 
plan revealed a sequence scan and the removal of duplicates 
by hashing the values. Again, the test results showed the 
same characteristics as all of our database tests performing 
a sequence scan. 

4. SUMMARY 

At first, our tests with our database server using normal 
HDDs indicates an energy increase of roughly 9 W when the 
HDDs are fully utilized. Compared to the energy consump- 
tion of the other measured core components during the tests, 
this is insignificant. The argument, SSDs should be prefered 
because they consume up to 12 times less energy compared 
to HDDs, is invalid in this context. 

7 The SQL standard standardized an inner join as <table 
a> [INNER] JOIN Ctable b> WHERE <a.xyz> = <b.xyz>. 
8 An equi-join is in the form <table a> JOIN <table b> ON 
<a.xyz> = <b.xyz>. The SQL standard allows shorthand 
for the column to join by using the USING clause. 



As Lang et all. stated in the summary of [17] , evaluating 
the energy effiency of a DBMS needs the inclusion of en- 
tire workloads, not just single queries. This study makes 
use of three different and complete workloads that allows 
are more comprehensive look at the energy effiency of a re- 
lational DBMS. Most of the benchmark queries caused a 
massive usage of sequential scans. This implies that the se- 
quential read performance is an extremly important factor 
that affects the energy consumption. Actual SSDs clearly 
outperform normal HDDs but in this case enterprise grade 
HDDs can be used because they offer nearly the same per- 
formance as SSDs. 

As mentioned in the introductionary section of this paper, 
there are more factors and not only technical parameters 
that influence the performance and thus the energy effiency 
of a database server. 

For example, the filesystem cache provided by the operating 
system is more relevant for the execution of a database query 
in PostgreSQL than its internal cache. Based on our exper- 
iments, we recommend not to assign more than 50 percent 
of the main memory to PostgreSQL for operations. With 
more assigned main memory the remaining processes of the 
operating system are forced to use the remaining portion. 
This causes the operating system to swap this portion to 
the hard drive which leads to a dramatic reduction of the 
performance. 

Another important factor for the energy effiency of the used 
database benchmark is the kind of accessing PostgreSQL 
(single vs. multiple database connections) . Our assumption, 
subsequent queries of the benchmarks would benefit from 
PostgreSQL's internal cache by using just a single database 
connection, does not come true. In fact, the opposite per- 
formed better. 

Our tests also indicate the fact that energy saving settings 
are counterproductive for a database server that is reason- 
ably utilized because it decreases the overall system perfor- 
mance. 

5. LIMITATIONS AND FUTURE WORK 

Since our tests were based on PostgreSQL as the DBMS, the 
results can not be purely adopted to other enterprise data 
management systems. PostgreSQL was chosen because its 
source code is freely available and a good starting point for 
academic research. However, the basic database technolo- 
gies like indices and data access patterns can be found in 
almost any other DBMS. Currently a newer version of Post- 
greSQ^ is available. Its release notes announces a better 
usage of indices and in general a higher performance. In 
the future all tests could be redone to analyze the impact of 
software improvements on the energy effiency. 

Another limitation on this was the strict use of HDDs. While 
a number of studies have reported the optimization in per- 
formance by the use of SSDs, this study considered only the 
use of HDDs as a basis since these are still common in dat- 
acenter operations. A future research task could consider 
a benchmarking scenario making a hybrid use of SSDs and 
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HDDs where SSDs could be used to improve access to indices 
and HDDs for the storage of the raw data. 
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APPENDIX 

A. MEASUREMENT ARRANGEMENT 
AND INSTRUMENTATION 

To measure the energy consumption of the mainboard we 



modified the ATX power cord as depicted in figure 10(a 
the wires of related pins were multiplexed into single cables 
and demultiplexed on the opposite side to its original con- 
figuration. The purpose is to have an easier and accurate 
way measuring the voltage drop of these cables. 



between 1 and 6. Each group stands for a power lane to be 
measured. Table|3]shows the power lanes and the pin group. 

To measure the power consumption of the mentioned com- 
ponents, six digital multi meters (Uni-T UT61-B with an 
average accuracy of ± 0.8 percent) and an energy logger 
( Voltcraft energy logger 4000 with an accuracy ± 1 percent) 
for the overall power consumption were applied. The bene- 
fit of this test arrangement is a very accurate measurement 
compared to other techniques such as power clamp meters. 



Name 


Voltage 


provides 


Pin group 




Main 


power supply (ATX) 




3.3V 


+3.3 V 


Mainboard (main 
memory) 


1 


5V 


+5 V 


Mainboard 


2 


12V1 


+12 V 


Mainboard 


3 


ATX 12 connector 


12V2 


+12 V 


CPU 


4 


Molex adapter 




+5 V 


internal drives 


5 




+12 V 


internal drives 


6 



Table 3: Power lanes 
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Figure 10: Pins used for measurements 



The same applies for the energy consumption of the CPU 
where the additional ATX 12 cable was modified as shown in 
figure [To (b) To measure the energy consumption of the two 
hard drives in the RAID array we used a Molex Y-splitter 
adapter to get the combined consumption and modified it 



respectivly. This is shown in figure 10(c] 



In general, figure [To] shows the pin usage. Related pins are 
grouped into pin groups which are represented by a number 



